Int. J. Exp. Res. Rev., Vol. 42: 320-327 (2024) 
Open Access 


International Journal of Experimental Research and Review (IJERR)
© Copyright by International Academic Publishing House (IAPH)

ISSN: 2455-4855 (Online)

www.iaph.in


DOI: https://doi.org/10.52756/ijerr.2024.v42.028 
Original Article 


Peer Reviewed 




Machine Learning Techniques for Medicinal Leaf Prediction and Disease Identification 


Bodduru Keerthana¹, J Vamsinath², Ch Sita Kumari³, Setti Vidya Sagar Appaji⁴, P Pratima Rani⁵ and
Satyanarayana Chilukuri⁶

¹Department of Information Technology, Anil Neerukonda Institute of Technology and Sciences (A), Visakhapatnam,
Andhra Pradesh, India; ²Department of Computer Science and Engineering, ICFAI University, Hyderabad, India;
³Department of Computer Science and Engineering, Gayatri Vidya Parishad College of Engineering (A),
Visakhapatnam, Andhra Pradesh, India; ⁴Department of Artificial Intelligence and Machine Learning, Raghu
Engineering College (A), Visakhapatnam, Andhra Pradesh, India; ⁵Department of Computer Science and Engineering,
Vignan's Institute of Information Technology (A), Visakhapatnam, Andhra Pradesh, India; ⁶Department of
Mathematics, BV Raju College (A), Bhimavaram, Andhra Pradesh, India
E-mail/Orcid Id: 


BK, keerthanabodduru@gmail.com, https://orcid.org/0009-0000-6820-514X;
JV, vamsi.img@gmail.com, https://orcid.org/0000-0002-2062-0907;
CHSK, sitha_kumari@gvpce.ac.in, https://orcid.org/0000-0002-2761-4200;
SVSA, vidyasagar.setti@raghuenggcollege.in, https://orcid.org/0000-0002-8221-8251;
PPR, prathimaranipalla@gmail.com, https://orcid.org/0009-0005-1439-5917;
SC, satyanarayana.ch@bvricedegree.edu.in, https://orcid.org/0009-0001-0489-2064


Article History: 
Received: 27th May, 2024
Accepted: 17th Aug., 2024
Published: 30th Aug., 2024


Abstract: Trees have been a crucial part of human life for hundreds of years,
providing food, shelter, and medicine. Some trees have medicinal properties that
cure many diseases. In the past, Ayurvedic methods were popular for various
treatments, but nowadays the demand for foreign medicine, which also has side
effects, is gradually increasing. This paper addresses this issue by identifying the
medical condition corresponding to a set of symptoms and predicting a herbal leaf
that can treat it, using modern machine learning techniques. We used three machine
learning methods to accomplish this goal: Multinomial Naive Bayes, Gradient Boosting
and Random Forest. These techniques were used to assess the symptoms and decide the
name of the disease and which leaf is appropriate as medicine. The highest accuracy
(92%) was produced by the Multinomial Naive Bayes algorithm, showing its
capability to predict the right medicinal leaf based on given symptoms. The results show
that machine learning algorithms, especially Multinomial Naive Bayes, can identify
diseases and recommend suitable medicinal leaves. This approach holds promise for
integrating traditional Ayurvedic knowledge with modern technology to offer
alternative treatments with potentially fewer side effects.

Introduction

A growing number of plants are being employed as
medications thanks to our evolving interaction with them.
As medical knowledge expanded rapidly, so did the
availability of medicines derived from plants. The
medicinal applications of plants mentioned in the Indian
Vedas are still in use today (Shi et al., 2021). Traditional
medical practices based on plants remain very important
in today's medical landscape (Rao et al., 2023). Herbal
medicine, health goods, medicines, food supplements,
nutraceuticals, and cosmetics are all experiencing rising
demand. In the 21st century, natural products have
accounted for more than 50% of all medications used in
clinical practice.

Medications produced from plants are used in the
treatment of mental illness, skin illnesses, TB, diabetes,
jaundice, high blood pressure, and cancer (Sarris et al.,
2021). In this study, we determine which type of leaf can
be used to treat a specific human ailment (Rao et al.,
2023). Machine learning can predict illness by evaluating
the user's symptom data and arriving at a diagnosis based
on the symptoms and information entered into the
system. This is achieved using the gradient boosting,
random forest, and multinomial naive Bayes algorithms
to forecast disease based on symptoms. These algorithms
are also used to predict the name of the leaf based on the
condition. The proposed model can predict the sickness
from a wide range of input symptoms. The final outcome
is the mode of the predictions of these machine learning
models.
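The mode-based combination described above can be sketched as follows. This is a minimal illustration only: the function name `majority_vote` and the example labels are assumptions and do not come from the original implementation.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among the model predictions.

    Ties are resolved in favour of the label that appears first,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of the three classifiers for one patient.
model_outputs = {
    "random_forest": "GERD",
    "gradient_boosting": "GERD",
    "multinomial_nb": "Gastritis",
}
final_disease = majority_vote(list(model_outputs.values()))
print(final_disease)
```

With three voters a tie occurs only when all models disagree, in which case this sketch falls back to the first model in the list.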

Existing System

# Many different systems are currently used for
disease forecasting, but they can only make disease
predictions.

# These systems cannot recommend leaves to treat the
predicted sickness.

# Conventional disease risk models commonly use
supervised machine learning algorithms, which require
training data annotated with labels to provide accurate
predictions (Uddin et al., 2019).

Proposed System

# The proposed system uses machine learning to
construct a disease prediction model. The algorithms
used are Random Forest, Gradient Boosting, and
Multinomial Naive Bayes.

# The user or patient provides a minimum of two
symptoms and a maximum of four symptoms, and the
output is predicted from these input symptoms.

# The predicted disease is mapped to the name of a
leaf, using a record of which diseases are treatable with
which leaves, to give the corresponding medicine.

# The leaves that can be used to treat a human disease
are predicted by our method after the disease has been
predicted.
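The two-to-four symptom constraint listed above could be enforced with a small input check like the one below. It is a sketch under assumptions: the function name, the normalisation to lowercase underscore-separated names, and the error message are illustrative choices, not details given in the paper.

```python
def validate_symptoms(symptoms, min_count=2, max_count=4):
    """Normalise symptom names and enforce the allowed number of inputs."""
    cleaned = []
    for symptom in symptoms:
        # Assumed naming convention: "Stomach Pain" -> "stomach_pain".
        name = symptom.strip().lower().replace(" ", "_")
        if name and name not in cleaned:  # drop blanks and duplicates
            cleaned.append(name)
    if not (min_count <= len(cleaned) <= max_count):
        raise ValueError(
            f"expected {min_count} to {max_count} symptoms, got {len(cleaned)}"
        )
    return cleaned


print(validate_symptoms(["Ulcers on tongue", "stomach_pain", "Acidity"]))
```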

Literature Survey 

Integrating machine learning (ML) and deep learning
(DL) based methods improves medicinal leaf prediction
and disease identification, which can be a potential
breakthrough for addressing many of the agriculture
challenges concerning the traditional medicine field.
Appropriate identification of medicinal plants is a
prerequisite for employing them in traditional medicine,
and modern computer vision techniques, together with
machine learning (ML), have enabled the evolution of
smart systems for this purpose. For example, the
Medicinal Plant Leaf Identification System uses
computer vision to extract a wealth of features from leaf
images that are then analyzed by ML algorithms to
accurately classify medicinal plants (Patil et al., 2023).
Convolutional Neural Networks (CNNs) have been
remarkably effective in this domain. One proposed
approach extracts and combines the textural features of
leaf images taken by a cell phone camera to improve plant
identification accuracy with an instance classification
method using CNN classifiers (S et al., 2024).

Fatima et al. (2017) discuss the methodologies used for
machine learning in the diagnosis of diseases such as
hepatitis, dengue, diabetes, liver disease, and heart
disease. Due to their effective attribute recognition, many
algorithms have proven their worth. According to
previous studies, SVM achieves an accuracy of 94.60% in
heart disease detection. Naive Bayes is effective in
diagnosing diabetes, giving the best classification
accuracy of 95%. The diagnostic accuracy of FT for liver
illness is 97.10%. Using RS theory, dengue fever can be
detected with a perfect success rate. The feed-forward
neural network diagnoses hepatitis with 98% precision.

A general disease prediction system based on machine
learning algorithms is presented by Dahiwade et al.
(2019). Given the rapid growth of medical data, they used
KNN and CNN algorithms to categorize patient data and
predict the disease from the symptoms presented. With
patient record data as input, the system could accurately
model whether a patient was sick or well, which gives
some idea of how accurate such general risk prediction
can be. This approach could make illness prediction and
risk evaluation faster and less resource-intensive.
Compared with the KNN algorithm, the CNN method
achieved better classification performance in a shorter
processing time. Consequently, they concluded that CNN
was better than KNN in terms of efficiency and precision.

Raj and Masood (2020) applied a range of ML and
DL methods to autism spectrum disorder (ASD)
diagnosis. To test the models trained in a non-clinical
context, they used multiple performance evaluation
metrics for children, adolescents and adults with ASD.
Furthermore, the results obtained from the multiclass
CNN classifier were compared with another recent work
of stock prediction based on support vector machine
(SVM), and it was found that after handling missing
values in the data set, the performance of SVM can be
exceeded by the proposed system. Both the SVM and
CNN-based models achieve virtually comparable
prediction accuracy of around 98.30% when missing
values are handled for the ASD child dataset.



Recent evolutionary research on aloes' plant uses and 
leaf succulence has provided new information about the 
plant's astounding commercial dominance, according to 
Grace et al. (2015). Perhaps the plant's widespread 
acclaim in the industry might be attributed to its early 
introduction to trade and cultivation, its closeness to 
important old trade routes, or both. Medicinal usage of 
Aloes is most strongly correlated with the presence of 
mature succulent leaf mesophyll tissue, whereas declines 
in medicinal use are often associated with evolutionary 
losses of succulence. It appears that the genus Aloe was 
partially successful in its environment because of its 
succulent leaves, which have well-developed mesophyll 
tissue. Phylogenetic analyses of plant use shed light on 
the significance of plant diversity on a worldwide scale. 

According to Nafisa et al. (2016), the first step in
purifying the crude ethanolic extract of S. grandiflora
leaves is to separate the different compounds according
to their solubilities. The solvents used include ethyl
acetate, petroleum ether, carbon tetrachloride,
chloroform and water. The researchers tested the extract
for antibacterial, anti-inflammatory, membrane-stabilizing,
and antidiarrheal effects. The drugs that served as
standards were those whose mode of action had already
been clarified: for example, streptokinase for
thrombolysis, acetylsalicylic acid for membrane
stabilization, kanamycin against infection, and
loperamide against diarrhea.

Manikanta et al. (2019) proposed methods to forecast
disease using a patient's symptoms, age and gender.
Diseases could be predicted with 93.5% accuracy using
these criteria and the weighted KNN model. The accuracy
values returned by the ML models were generally high.
Some models failed to predict the disease and had a low
accuracy rate because they relied too much on
parameters. If diseases could be forecast, the drugs
needed to treat them could be allocated more efficiently
(Keniya et al., 2020). Using this technique would reduce
the financial burden of treating the condition and speed
up the healing process.


Materials and Methods 

Data Collection: Data preparation is the initial stage
in tackling any machine learning problem. Kaggle data
and our own custom data are used in this study. One
CSV (Comma Separated Values) file is used for training,
while the other is used for testing; the second CSV file
has three columns: disease, leaf name, and scientific
name.

Cleaning the Data: In machine learning, cleaning is 
the first and foremost requirement. Our machine-learning 
algorithm can only produce accurate results if we use 
high-quality data. Therefore, data cleansing is always 
required before using it to train a model. All of our data's 
columns contain numbers, making them ideal for use as a 
target column. 

Normalization: Symptom data were normalized so that
all features contribute equally to the prediction models.

Feature encoding: Categorical variables (like
symptoms and disease names) were encoded using
one-hot encoding techniques to be compatible with
machine learning algorithms.
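As an illustration of the encoding step, the sketch below turns a patient's selected symptoms into a binary vector with one position per known symptom. The vocabulary shown is a small hypothetical subset, and the function name `encode_symptoms` is an assumption; the paper does not list its actual feature columns.

```python
def encode_symptoms(selected, vocabulary):
    """Return a 0/1 vector with a 1 for every selected symptom."""
    index = {name: position for position, name in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for name in selected:
        if name not in index:
            raise KeyError(f"unknown symptom: {name}")
        vector[index[name]] = 1
    return vector


# Hypothetical symptom vocabulary (column order of the training data).
SYMPTOMS = ["acidity", "cough", "stomach_pain", "ulcers_on_tongue"]

print(encode_symptoms(["ulcers_on_tongue", "stomach_pain", "acidity"], SYMPTOMS))
# -> [1, 0, 1, 1]
```

Keeping the vocabulary order fixed matters, because the trained models interpret each vector position as one specific symptom.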

Model Building: In order to train a machine learning 
model, data must first be gathered and cleaned. The 
cleansed data will train decision trees, Multinomial Naive 
Bayes Classifiers (MNB), Gradient Boosting, and 
Random Forest Classifiers. The models' accuracy will be 
evaluated using a confusion matrix. 
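A confusion matrix of the kind mentioned above can be computed with a few lines of standard-library Python. This is only a sketch: the labels and predictions are made-up toy values, and rows are taken to be true classes and columns predicted classes, which is a common convention rather than something the paper specifies.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy_from_matrix(matrix):
    """Correct predictions lie on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy example with hypothetical disease labels.
labels = ["GERD", "Migraine"]
y_true = ["GERD", "GERD", "Migraine", "Migraine", "GERD"]
y_pred = ["GERD", "Migraine", "Migraine", "Migraine", "GERD"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)                        # -> [[2, 1], [0, 2]]
print(accuracy_from_matrix(matrix))  # -> 0.8
```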

Multinomial Naive Bayes Algorithm: Multinomial
Naive Bayes is a variant of the Naive Bayes method used
when the features being considered are nominal (Rao et
al., 2022). Applications of this algorithm include spam
filtering, sentiment analysis, and subject classification.
A large amount of training data is needed to estimate the
probability of the characteristics in each class reliably.
The method is based on Bayes' theorem:

$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{1} $$

The Naive Bayes algorithm in machine learning has
many variants, and one of the most helpful is
Multinomial Naive Bayes, which can be used for a
dataset with a multinomial distribution (Rao et al.,
2023). This method is useful when there are several
classes into which to sort data: to determine the label of
an input, it first computes the likelihood of each label for
that input and then outputs the label with the highest
probability.
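The "score every label, keep the most probable" procedure can be illustrated with a small from-scratch Multinomial Naive Bayes. This is a sketch under assumptions, not the authors' code: Laplace smoothing with `alpha=1.0` is an added choice, and the four-symptom toy dataset and the "Cold" label are invented for the example.

```python
import math
from collections import defaultdict


def train_mnb(X, y, alpha=1.0):
    """Estimate log priors and smoothed log feature likelihoods per class."""
    n_features = len(X[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            feature_counts[label][j] += value
    log_prior = {c: math.log(n / len(y)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c, counts in feature_counts.items():
        total = sum(counts) + alpha * n_features  # Laplace smoothing
        log_likelihood[c] = [math.log((v + alpha) / total) for v in counts]
    return log_prior, log_likelihood


def predict_mnb(model, row):
    """Return the class with the highest posterior log score."""
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(v * lp for v, lp in zip(row, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy data; columns: acidity, stomach_pain, cough, fever (hypothetical).
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
y = ["GERD", "GERD", "Cold", "Cold"]
model = train_mnb(X, y)
print(predict_mnb(model, [1, 1, 0, 0]))  # -> GERD
```

Working in log space avoids numerical underflow when many feature probabilities are multiplied together.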


Figure 1. Multinomial Naive Bayes Model.

Gradient Boosting Algorithm: Gradient Boosting is a
robust machine learning approach that can address
classification and regression problems. It is an ensemble
approach that creates a strong learner by combining
numerous weak learners, which are often decision trees.


$$ \hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{2} $$

$$ F = \{ f(x) = w_{q(x)} \} \quad (q : \mathbb{R}^m \to T,\; w \in \mathbb{R}^T) \tag{3} $$

$$ L(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k) \tag{4} $$

where $l$ is the loss function that measures the difference
between the predicted value $\hat{y}_i$ and the target
value $y_i$, and $\Omega$ represents the complexity of
the model (Mishra et al., 2020). The capacity of Gradient
Boosting to handle complex interactions between
variables and to automatically accommodate missing
values is one of its most appealing features, as noted by
Salehi et al. (2019) and Parnami et al. (2018). Gradient
boosting is frequently used in natural language
processing, fraud detection, and predictive modeling.
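The additive structure of Equation (2), where each weak learner is fitted to what the current ensemble still gets wrong, can be sketched on a toy problem. The example below is deliberately simplified and rests on several assumptions: it uses squared-error regression on a single numeric feature, decision stumps as weak learners, a fixed learning rate, and it omits the complexity term of Equation (4). The paper's models are classifiers, so this is an illustration of the principle only.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature (least squared error)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Return a predictor built as base value plus a sum of scaled stumps."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4]
ys = [1.0, 1.0, 3.0, 3.0]
booster = gradient_boost(xs, ys)
print(round(booster(1), 3), round(booster(4), 3))
```

Each round shrinks the remaining residuals, so the prediction moves step by step from the global mean towards the targets.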

Random Forest algorithm: Random Forest is a
popular supervised machine learning technique. It can be
applied to both classification and regression problems.
Its decision trees commonly choose splits using the Gini
index:

$$ \text{Gini Index} = 1 - \sum_{i=1}^{n} (P_i)^2 = 1 - \left[ (P_+)^2 + (P_-)^2 \right] \tag{5} $$

The Random Forest classifier combines the results of
numerous decision trees trained on various subsets of the
input data. As a result, predictive accuracy improves
compared with relying on only one set of rules or a single
decision tree. The random forest takes in all of the
predictions from the trees, counts up the votes, and then
produces its own prediction based on the most popular
answer.
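As a worked example of Equation (5), the short sketch below computes the Gini index of a set of class labels. The function name and the sample labels are illustrative assumptions.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


print(gini_index(["GERD", "GERD", "GERD", "GERD"]))  # pure node -> 0.0
print(gini_index(["GERD", "GERD", "Cold", "Cold"]))  # even two-class split -> 0.5
```

A value of 0 means every sample in the node belongs to one class, while 0.5 is the maximum for two classes.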


System Architecture:

Figure 4 shows the system architecture. First, we collect
symptoms from users or patients. A patient can give a
minimum of 2 and a maximum of 4 symptoms. The
system predicts the disease by applying ML algorithms
(Rao et al., 2023). Here, we used 3 algorithms: Random
Forest, Multinomial Naive Bayes, and Gradient Boosting.
To predict the medicinal leaf and identify the disease, we
need two datasets: one collected from Kaggle and another
prepared by us using Ayurvedic books. We predict the
medicinal leaf to cure the disease using the identified
disease and the prepared dataset.
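The final step, mapping an identified disease to a leaf through the prepared dataset, can be sketched as a simple table lookup. The CSV content below is a one-row stand-in: the GERD and Catnip pairing is the one shown in the paper's example output, the column names `disease`, `leaf_name` and `scientific_name` are assumed spellings of the three columns described under Data Collection, and the scientific name is filled in for illustration.

```python
import csv
import io

# Illustrative stand-in for the prepared disease-to-leaf CSV file.
LEAF_CSV = """disease,leaf_name,scientific_name
GERD,Catnip,Nepeta cataria
"""


def load_leaf_table(text):
    """Build a disease -> (leaf name, scientific name) mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["disease"]: (row["leaf_name"], row["scientific_name"])
        for row in reader
    }


def recommend_leaf(disease, table):
    """Return the leaf entry for a disease, or None if it is not listed."""
    return table.get(disease)


leaf_table = load_leaf_table(LEAF_CSV)
print(recommend_leaf("GERD", leaf_table))
```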


Result and Discussion 

Machine learning methods have been used in diverse
ways for plant disease recognition and identification.
Different machine learning algorithms (such as random
forest, k-nearest neighbor, and support vector machine)
have been validated through several studies for plant
disease diagnosis (Hang et al., 2019). In this work, we
applied the random forest, multinomial Naive Bayes
(MNB), and gradient boosting algorithms to Kaggle data
and our own custom data.
Performance Evaluation Metrics 

The following metrics evaluate the performance of the 
proposed model. 

Accuracy: Accuracy is the ratio of correctly predicted
samples to the total number of samples.

$$ \text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{6} $$

Precision: Precision is the ratio of true positives to the
sum of true positives and false positives.

$$ \text{Precision} = \frac{TP}{TP + FP} \tag{7} $$

Recall: Recall is the ratio of true positives to the sum of
true positives and false negatives.

$$ \text{Recall} = \frac{TP}{TP + FN} \tag{8} $$

F1-score: The F1-score is the harmonic mean of
precision and recall.

$$ \text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9} $$

Where TP-True Positive, TN-True Negative, FP-False
Positive and FN-False Negative.
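The four metrics in Equations (6) to (9) can be computed directly from the confusion counts, as in the following sketch. The function names and the example counts are illustrative assumptions; the zero-denominator fallbacks are an added convention.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_score(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration.
print(classification_metrics(tp=8, fp=2, fn=1, tn=9))

# Precision and recall values of 0.91 and 0.90 give an F1-score of about 0.905.
print(round(f1_score(0.91, 0.90), 3))  # -> 0.905
```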

Figure 5 shows the program interface, which lets
the user enter the symptoms of the disease as input.


Figure 6 shows the prediction accuracy of the
various algorithms. MNB gives better accuracy than the
others.

Comparative Analysis 

The comparative analysis of the three models is
summarized in Table 1. The Multinomial Naive Bayes
model outperformed the Gradient Boosting and Random
Forest models in terms of overall accuracy.

Figure 2. Gradient Boost Model.


Figure 3. Random Forest Model.


Figure 4. Architecture Diagram.

Figure 5. List of Symptoms.
[Figure 6 screenshot: for the symptoms ulcers_on_tongue, stomach_pain and acidity,
all three models predict the disease GERD and the leaf Catnip. Reported accuracy:
Gradient Boost 90.48%, Multinomial Naive Bayes 92.86%, Random Forest 90.48%.]


Figure 6. Prediction of disease. 


Table 1. Comparative Analysis of three ML Models

Model                     Precision   Recall   F1-score   Accuracy
Multinomial Naive Bayes   91%         90%      90.5%      92%
Gradient Boosting         87%         86%      86.5%      88%
Random Forest             84%         83%      83.5%      85%


Disease Recognition Performance

However, the performance of the models was quite
different in identifying individual diseases. The
Multinomial Naive Bayes model best predicted diseases
with a clear set of symptoms. The Gradient Boosting
model performed best in modeling diseases with complex
and overlapping symptom patterns, while the Random
Forest model outperformed others on specific
symptom-patterned disease models.

Due to the high accuracy, flexibility and robust nature
of the Multinomial Naive Bayes model, it is very
useful in integrating Ayurvedic practices with modern
technology. This provides an opportunity for less harmful
treatments with fewer side effects to complement
mainstream medicine. The model can be integrated into
mobile applications, which will help users in far-flung
regions to access ancient medicinal knowledge.


Conclusion and Future Scope 

This paper predicts the disease suffered by a person
from the given symptoms using ML algorithms, namely
Random Forest, Gradient Boost and MNB. It takes
symptoms as input and outputs the predicted disease
name and the name of a leaf to cure that disease. In this
paper, the MNB algorithm provides the highest accuracy,
92%, giving better predictions than the other algorithms.


Future Scope 

# In the future, we can extend this work by taking
images such as MRI and CT scans as input.

# Display of leaf images can also be added as an
output.